ABSTRACT




This thesis develops and explores the graphical analysis
of multivariate data sets through the use of a Draftsman
technique of scatter plot displays. These plot displays are 
useful for determining associations and relationships 
between variables in order to promote an understanding of 
the characteristics of the data in exploratory and descrip- 
tive applications. General graphical enhancement techniques 
such as jittering and transformations are discussed and 
incorporated in the development of a computer program which 
produces Draftsman displays. A technical description of the 
Draftsman computer program is presented, and user implemen- 
tation procedures discussed. An analysis is conducted on 
two varied sets of data to demonstrate the versatility and 
utility of the Draftsman display technique for exploring 
data structures. 
I. INTRODUCTION



A. MOTIVATION 

Recent advances in computer hardware and software
capabilities have made powerful diagnostic and analytical
tools for exploring data available to a larger number of
users. These same advances, however, are responsible for a
tremendous increase in the amount of data produced and
available for analysis. Contrary to mathematical intuition,
the availability of more data does not always lead to
greater precision in subsequent analysis. Often the
increased amount of data confounds the analysis by
overburdening our ability to process the information in a
timely and understandable fashion.

Graphical displays are a method of visually portraying 
vast amounts of qualitative information. The primary 
benefit of graphical techniques is that the human eye-brain 
system has a powerful information processing capability. By 
maximizing our visual capability to process properly 
displayed data, we can rapidly summarize information, focus 
on salient features, discern aberrations, and extract
details of interest from a data set. 

B. SCOPE 

The purpose of a Draftsman display is to use the visual 
impact of an array of two dimensional scatter plots to 
analyze multivariate data. This can be accomplished by 
arranging an exhaustive series of plots consisting of all 
paired variables. This enables the analyst to observe the 
influence of each variable on every other variable in the 
data set. 
The concept of using two dimensional scatter plots to
display higher dimensional data structures is discussed in a
text by Chambers, et al. [Ref. 1]. The ideas from that
text served as the foundation for the development of an
interactive computer program to construct Draftsman
displays. Scatter plot enhancements such as the jittering of
discrete data values and the transformation of variables can
also be applied to this multidimensional display procedure.
The full considerations that went into the program
development, as well as user implementation procedures, are
amplified in later chapters.

The purpose of this thesis is to integrate the graphical
concepts of scatter plots, jittering, and transforming
variables into a Draftsman display. Although the program is
written in A Programming Language (APL), little if any
knowledge of this language is required to use it
successfully.

This thesis has been written in three major segments in 
order to appeal to the widest audience possible. The first 
segment, composed of chapters II and III, deals with the
general concepts of graphical methods and the user instructions
required to invoke the Draftsman display program. The
second segment, comprised of chapter IV and Appendix A, is 
aimed at those readers interested in the technical details 
and Draftsman program documentation. The final segment, 
found in chapters V and VI, contains a stepwise analysis of 
two varied forms of data to demonstrate potential applica- 
tions of this procedure in exploratory data analysis. 

The graphs used in this paper were produced by an exper- 
imental APL package GRAFSTAT, which the Naval Postgraduate 
School is using under a test agreement with the IBM Watson 
Research Center, Yorktown Heights, New York. We are 
grateful to Dr. P.D. Welch and Dr. P. Heidelberger for 
making GRAFSTAT available to us. 
II. GRAPHICAL TECHNIQUES



A. DATA DISPLAYS 

A variety of graphical methods such as box plots,
histograms, stem and leaf displays, and scatter plots are
available to explore relationships which may exist between
variables of a data set. The scatter plot is perhaps one of
the most powerful graphical methods for displaying bivariate 
data. The foremost feature of the scatter plot is that all 
of the data of interest is readily displayed for visual 
interpretation. In addition, the simplicity of construc- 
tion, compactness of the display, and adaptability to other 
graphical enhancement techniques, contribute to the power of 
this display. 

In contrast, numerical summaries may reflect correlation
but tell little about clustering, patterns, or other
relationships which might be present. This is particularly
true of larger data sets consisting of more than twenty
observations and more than two variables. In these larger
data sets, the sheer volume of data points to be compared
makes interpretation a tedious and time consuming process.

Figure 2.1 is a scatter plot of weight versus engine
displacement for 106 different models of cars produced
during 1983 [Ref. 2: pp. 320-356]. A numerical summary might
readily impart the fact that an increase in car weight is
associated with an increase in engine size. The scatter
plot, however, rapidly makes apparent some other interesting
features. We can see that the observations consist of two
distinct groupings. For vehicles under 3,000 lbs there is a
strong positive linear dependency between weight and
displacement. For the heavier vehicles over 3,000 lbs,
increasing weight still tends to be correlated with larger
displacements, though the relationship is more dispersed in
form. A numerical summary would not so easily reveal these
relationships.
Figure 2.1 Basic Scatter Plot.

For many applications, our interest may extend beyond
bivariate data sets to larger multidimensional sets. As in
the bivariate case, scatter plots may also be used to
graphically display multivariate data sets. An exhaustive
series of plots consisting of all paired variables performs
a function similar to that of the single scatter plot for
bivariate data. By properly aligning the plots so that every
plot shares a common axis with the adjacent plots, we can
not only observe the relationships within a specific plot
but may also follow particular observations or groups of
observations through the successive plots to analyze the
influence of other variables. This particular technique of
arranging the scatter plots is similar to a draftsman's
drawing of a three dimensional object and hence is termed a
Draftsman display. [Ref. 1: p. 136]
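
The arrangement of paired plots described above can be sketched in a few lines of code. The following Python fragment is an illustrative sketch only and is not the APL program of this thesis; the function name draftsman_layout is hypothetical, and it assumes that each row of the display plots one variable on the vertical axis against every other variable in turn.

```python
def draftsman_layout(names):
    """Return the grid of (vertical, horizontal) variable pairs.

    Assumption for illustration: row i holds the plots of variable i
    against every other variable, so the plots in one row share a
    vertical axis. A variable is never plotted against itself.
    """
    return [[(y, x) for x in names if x != y] for y in names]


# Three variables give three rows of two plots each (six plots in all).
layout = draftsman_layout(["weight", "turning radius", "displacement"])
for row in layout:
    print(row)
```

With k variables this layout holds k(k-1) plots, and each pair appears twice with its axes exchanged, which is what allows a point to be followed across a row or down a column.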

The three dimensional draftsman display shown in figure 
2.2 consists of the variables of weight, turning radius, and 
engine displacement for 1983 model cars. The first row 
shows the paired plots of weight versus turning radius and 
engine displacement. The second row is turning radius 
versus weight and engine displacement. The bottom row of
plots consists of engine displacement versus weight and
turning radius. This arrangement of the plots, while
somewhat redundant, allows the viewer to scan across rows or
down columns of plots, thereby matching up points that
correspond to the same observations in different plots.
Figure 2.2 Three Dimensional Draftsman Display.

Observing the bottom row of plots in figure 2.2 , we can 
track three distinct groupings of points through all the 
paired plots. These three groups correspond to the small, 
medium, and large size categories of vehicles. A quick look 
at the associations exhibited in this display indicates also 
that engine displacement has a tighter relationship with
weight than it does with turning radius. Other relationships
are also evident and are presented in greater detail in the 
analysis presented in Chapter V. 
B. JITTERING OF VARIABLES



In certain circumstances, a scatter plot may itself be
visually deceiving due to the overlapping of data points
within the plot. This situation may be particularly
prevalent when one or both of the plotted variables have a
limited range of discrete values. In order to alleviate the
overlapping and reveal the actual relationships that exist,
a small amount of random noise may be added to one or both
of the variables to "jitter" their horizontal and vertical
locations within the plot. The amount of random noise added
to or subtracted from the original data values must be
sufficient to prevent overlapping but small enough so that
the original data values can be recovered by rounding to the
nearest whole number. Typically the random noise added is
two to five percent of the total range of the variable
values. [Ref. 1: pp. 106-107]

The visual difference resulting from jittering can be
seen in figure 2.3, where the maintenance records for 1981
versus 1982 are plotted for 106 automobile models.
Maintenance is a categorical variable with values of 0, 1,
..., 5. Clearly a problem of overlapping exists in the basic
scatter plot seen on the left. The jittered version on the
right gives a more accurate picture of the distribution and
clustering prevalent in the data.
Figure 2.3 Unjittered and Jittered Plots.

C. TRANSFORMATION OF VARIABLES



The primary purpose of employing transformations is to
fy the observed relationship between the
In many instances the plots may be 
ough the use of transformations in order 
r and more understandable picture suit- 
comparisons and exploration. Hoaglin
proposes the following pertinent reasons 
ormations. 

terpretation in a natural way. 

try in a batch. 

e spread in several batches. 

ghtline relationships between the vari- 

structure of a two way or higher dimen- 
tructure so that a simple additive model 
n the understanding of the characteris- 
ata. 

transforming the variables is that if 
mation is applied, the resulting scatter 
e linear in form. This in turn visually 
, detection of deviations and outliers, 
ing relationships or patterns,
iscussed, the basic scatter plot of 
splacement is divided into two distinct 
seen in the left plot of figure 2.4 . 
p appears fairly linear, the upper group 
nd curved in shape. The plot on the 
ect of applying a log transform to the 
values. The resulting plot of the 
omes more linear over the entire range 
nt values (see figure 2.4 ). 
Figure 2.4 An Example of Transformation.

A note of caution is appropriate in determining when to
use transformations in the Draftsman display. Since
transformations result in a change to the displayed values
and scale, care must be taken to avoid confusion during
subsequent analysis. We should ensure that the benefits of
describing the data with a transformation are greater than
the loss of simplicity incurred through its use.
III. DRAFTSMAN USER INSTRUCTIONS
A. GENERAL GUIDANCE 

The Draftsman program was written in APL and is designed 
to be used in conjunction with the experimental IBM graphics 
software GRAFSTAT. The Draftsman program is interactive and 
requires little knowledge of APL to use. The APL versed 
user can easily modify the basic program and called subrou- 
tines for more specialized forms of analysis. 

The graphical software which generates the Draftsman
displays requires the use of either the IBM 3277GA or
3278/79 graphic display terminals [Ref. 4]. Normally these
terminals are available as public facilities with special
accounts and passwords. Once logged on to one of these
terminals, users may link back to their own account and
copy any of their own files as desired. This is useful in
retrieving data files which the user wishes to analyze with
a Draftsman display. [Ref. 5]
take a few minutes to familiarize themselves with the
locations of these characters.

The APL environment will allow the user to copy and 
retrieve both the GRAFSTAT and Draftsman programs as shown 
in figure 3.2 . Once these programs are in the workspace 
the basic set up procedure is complete and the user is ready 
to actually initiate the Draftsman program to produce a 
display. The Draftsman program is initiated by typing
DRAFTSMAN followed by return. The program will respond with
a series of terminal queries requesting the various input
parameters required in the display. Each query is generated
based upon the user response to the previous query. The
general program schematic and input requirements are outlined
in figure 3.1.
Figure 3.1 Draftsman Program Schematic.



The first option presented is that of inputting a data
set (figure 3.3). Data which has been previously copied
from another APL workspace may be entered by variable name.
Data which is located on a CMS file can be automatically
read into the workspace. Data in a CMS file can contain
only numeric characters. A mixture of numeric and
alphabetic characters will result in the data not being read
in correctly. A crucial requirement is that, regardless of
how the data is entered, it must be in two dimensional array
form (rows and columns). The columns of the data correspond
to the variables, the rows to the different observations on
each variable.
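
The input requirements above (numeric entries only, arranged as a rectangular array of observations by variables) can be illustrated with a short check. The following Python sketch is not the CMS file reader used by the Draftsman program; the function name read_numeric_table and the whitespace-separated layout of the file are assumptions made for illustration.

```python
def read_numeric_table(text):
    """Parse whitespace-separated numbers into a list of rows.

    Hypothetical helper: each text line is one observation (row) and
    each field is one variable (column). Any non-numeric field, or a
    row of different length, is rejected rather than read incorrectly.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            rows.append([float(field) for field in fields])
        except ValueError:
            raise ValueError(f"line {line_no}: non-numeric entry") from None
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows have differing numbers of columns")
    return rows


# Two observations on three variables.
table = read_numeric_table("2650 33.5 151\n3410 38.0 231\n")
```

Rejecting a malformed file outright is a design choice of this sketch; the text above states only that mixed numeric and alphabetic data will not be read in correctly.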

Once the basic data has been entered the user is 
presented with an option to have either all of the data or 
only a subsample of the data appear in the display. This 
allows Draftsman displays to be produced on either all the 
data, specified variables, a subpopulation of a variable, or 
any combination thereof (figure 3.4 ). 

Based on the data selected, an option will be presented 
to enter the appropriate names of the variables which will 
appear in the display (figure 3.5). These names are the
labels which will appear on the axes of the plots. The
variable names can be entered as a previously generated APL 
two dimensional array of characters. If this method of 
input is selected, each row of the array must contain the 
name of a variable in the same order as the variable is 
located in the data structure. The variable names may also 
be entered directly in response to a sequential series of 
queries. Once the variable names are entered, the minimum 
input requirements needed to produce a Draftsman display 
have been completed. The remainder of the queries pertain 
to display enhancements which may be invoked if desired. 

The first enhancement option is that of jittering
(figure 3.6). An input of 0 will result in no jittering of
the data. If jittering is desired, the user will be queried
as to which variables. The results of jittering appear only
in the Draftsman display and do not permanently alter any of
the values in the original data set.

The second enhancement option available is transformation
of variables (figure 3.6). Here again, a response of
0 will result in no transformations occurring. If one or
more transformations are desired, the user is prompted for
the variables and the APL expression for each transformation.
A summary of some of the more common transformations with
examples is given in Table I.



TABLE I

Sample APL Transformations

TRANSFORM      MATH FORM     APL EXPRESSION

LOG            ln X          ⍟X
LINEAR         AX + B        B+A×X
CUBIC          X^3           X*3
CUBE ROOT      X^(1/3)       X*(1÷3)
SQUARE         X^2           X*2
SQUARE ROOT    X^(1/2)       X*(1÷2)


The Draftsman program will begin to display the
component scatter plots on the graphics screen. The entire
display is generated in segments of five variables. At the
end of each segment an option is offered for the user to
quit, continue, or make a hardcopy and continue.
Figure 3.2 Accessing the Graphics Programs.
Figure 3.3 Data Input Options.
Figure 3.4 Data Subsampling Options.
Figure 3.5 Variable Labeling Options.
Figure 3.6 Display Enhancement Options.



IV. DRAFTSMAN TECHNICAL IMPLEMENTATION



A. BASIC DRAFTSMAN ROUTINE

The Draftsman program was written in APL,
array processing language. The use of APL e 
Draftsman program to call and implement a varie 
ting functions available in the IBM GRAFSTAT grap 
ware package. The GRAFSTAT software is an e 
graphics program currently under development by I 
presently available at the Naval Postgraduate 
testing and evaluation purposes. [Ref. 4] 

A secondary benefit derived in using APL is t 
user efficiency characteristics in terms of the 1 
of mathematical operations executable directly 
entries. This approach is ideal for exploring 
tures and features of interest. [Ref. 6] 

The foundation of the Draftsman program revo 
the graphical plotting features of GRAFSTAT, and 
ular the scatter plot option. This option requir 
to input the two variables of interest, size and 
Figure 4.1 GRAFSTAT Scatter Plot Function Screen.



location in which each of the plots will appear for display. 
This methodology produces the entire array of plots while 
eliminating the need for reiterative inputs by the user for 
each plot. The output is displayed as a row of scatter 
plots for each variable in the data set.

For data structures consisting of five or fewer
variables, the Draftsman display will fit on a single
page. The plotting of five variables per page was selected
to balance space limitations against the need for sufficient
clarity of detail within the plots. Accommodating more than
five variables on a single page would require smaller plots,
reducing the visual usefulness of the display and
making comparisons inconvenient. Fewer than five variables
per page results in the excessive use of costly graphic
reproducing paper. For data sets exceeding five variables,
the Draftsman display is generated in segments which, when
reproduced, may be pasted together to form a completed
display.
The segmental method of producing Draftsman displays
enables the user to display data sets consisting of more
than five variables. The display procedure is limited only
by the workspace capacity of the user computing facility.
The number of segments that will result in a Draftsman
display can be calculated by squaring the number of
variables in the data set and dividing by 25. In practice, a
display of more than 15 variables becomes somewhat unwieldy
and may negate the benefits of using a Draftsman
methodology.
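
The segment count can be worked through numerically. The Python sketch below assumes that each segment is a block of at most five by five variables from the full array of plots, so that a partial block still occupies a whole segment; this rounding is an assumption and is not stated in the text. The function name segment_count is hypothetical.

```python
import math


def segment_count(n_variables, per_segment=5):
    """Number of page-sized segments in the full display.

    Assumption for illustration: the display is tiled by square blocks
    of per_segment variables on a side, and a partly filled block
    still needs its own segment.
    """
    blocks_per_side = math.ceil(n_variables / per_segment)
    return blocks_per_side * blocks_per_side


# For multiples of five this agrees with (number of variables)^2 / 25.
counts = {n: segment_count(n) for n in (5, 10, 15)}
```

Under this assumption 5, 10, and 15 variables give 1, 4, and 9 segments, matching the rule of squaring the number of variables and dividing by 25; for other counts the simple rule gives a fractional value that would need to be rounded up.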

B. TECHNICAL DETAILS OF INPUT REQUIREMENTS

A two dimensional array of data and a two dimensional 
array of the data variable names are the minimum input 
parameters required to generate a Draftsman display. These 
parameters are inputted as prompted by the routine ADMIN. 

Data may be input directly as an APL variable or 
retrieved from a CMS file located on the user's disk. File 
reading is accomplished by CMSREAD [Ref. 6], a library 
routine which has been pre-copied into the Draftsman 
workspace. 

A program entitled SUB was written to assist in the
restructuring of data sets into more convenient formats. An
initial analysis of the basic Draftsman display may reveal
certain variables or sections of data points which warrant
closer scrutiny. The SUB program allows the user to select
variables from the original data set as well as subsamples
of a variable in order to create a new data set entitled
DATA. DATA becomes the global variable that is actually
displayed. The APL program SUB which implements this
procedure is found in Appendix A.
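
The kind of restructuring performed by SUB can be illustrated as follows. This Python sketch is not a translation of the APL routine in Appendix A; the function name subsample, its arguments, and the sample values are hypothetical and serve only to show the selection of columns (variables) and of rows (a subpopulation).

```python
def subsample(data, columns, keep_row=None):
    """Build a new data set from chosen columns and, optionally, rows.

    data     : list of rows, one row per observation
    columns  : indices of the variables to retain, in display order
    keep_row : optional predicate on a full row selecting a subsample
    The original data set is left unchanged.
    """
    rows = data if keep_row is None else [r for r in data if keep_row(r)]
    return [[row[c] for c in columns] for row in rows]


# Illustrative values only: weight (lbs), displacement (cu in), price ($1000).
original = [
    [2100.0, 98.0, 6.2],
    [2950.0, 151.0, 8.9],
    [3600.0, 302.0, 11.5],
]
# Keep weight and displacement for vehicles under 3,000 lbs.
DATA = subsample(original, columns=[0, 1], keep_row=lambda r: r[0] < 3000)
```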

The matrix of variable names is either input directly as
an APL two dimensional array or is generated by the routine
LABELS. The rows of the matrix correspond to each of the
variables in the data set which is to be displayed. When
generated by LABELS, each variable name entered containing
fewer than 20 characters is padded with blank spaces. If
more than twenty characters are entered, only the first
twenty characters will appear on the display. This assures
that the entire array, when passed to succeeding routines, is
a valid rectangular character array. The LABELS routine
which implements this procedure is found in Appendix A.
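
The padding and truncation rule can be shown with a brief example. The Python sketch below mirrors the behavior described for LABELS but is not the APL routine itself; the function name pad_labels is hypothetical.

```python
def pad_labels(names, width=20):
    """Return the names as equal-width rows of a character array.

    Names shorter than width are padded on the right with blanks;
    longer names keep only their first width characters.
    """
    return [name[:width].ljust(width) for name in names]


labels = pad_labels(["Weight", "Engine displacement (cubic inches)"])
```

Every row of the result is exactly twenty characters long, which is what makes the array rectangular.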



C. ENHANCEMENT ROUTINES 
1. Jitter Routine

As discussed in Chapter II, overlapping of plotted 
values may be misleading and inadequately portray the visual
relationships exhibited in the data structure. The solution 
is to jitter or add random noise to one or both variables to 
be plotted. This technique is presented as an enhancement 
option to the user and requires only an identification of 
the variables upon which jittering will be performed. 

The jittering of variable points within the
Draftsman program is accomplished through a method discussed
by Chambers [Ref. 1: pp. 106-107]. We let $U_i$, for
$i = 1$ to $n$ (the number of observations), be $n$ equally
spaced values from -1 to +1 in random order. The original
variable values $X_i$ are thus reexpressed in jittered form
$J_i$,

$$J_i = X_i + \theta_x U_i$$

where $\theta_x$ is .05 times the range of the variable data
values. This method results in a fractional shift of the
data values along the same axis in which the variable is
plotted.
The small shifting of plotted points is sufficient
to negate the effects of overlaps while preventing any
serious corruption of the plotted data. It shows the
multiplicity of points at each actual coordinate. The
original data values can be recovered by rounding to the
nearest integer. Internally the Draftsman program uses the
original data to create local variables which are jittered
and then plotted. This enables the user to always maintain
the data in original form.
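
The jitter procedure described above can be sketched in Python as follows. This is an illustrative sketch and not the APL routine of Appendix A; the function name jitter and its parameters fraction and rng are hypothetical. It uses n equally spaced values from -1 to +1 in random order, scaled by .05 times the range of the data, and returns a new list so that the original values are untouched.

```python
import random


def jitter(values, fraction=0.05, rng=random):
    """Return jittered copies of values; the input is not modified.

    u holds n equally spaced values from -1 to +1, shuffled into
    random order; theta is fraction times the range of the data.
    """
    n = len(values)
    if n < 2:
        return list(values)
    u = [-1 + 2 * i / (n - 1) for i in range(n)]
    rng.shuffle(u)
    theta = fraction * (max(values) - min(values))
    return [x + theta * ui for x, ui in zip(values, u)]


# Repair ratings take the values 0 to 5, so theta is 0.25 here and every
# jittered value stays within 0.25 of its original rating.
repair = [0, 1, 1, 3, 3, 3, 5, 5, 2, 4]
jittered = jitter(repair, rng=random.Random(1))
recovered = [round(v) for v in jittered]
```

Because the largest shift is theta, rounding recovers the original ratings whenever theta is below one half, as it is for a variable with a range of five.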

2. Transformation Routine

The potential for transforming variables was written
as a user option to further enhance the basic Draftsman
display. This routine takes full advantage of the APL
primitive functions as well as parallel array
processing. The program requires the identification of the
variables to be transformed and the appropriate APL
expression for each transformation. The parallel
processing capability transforms each variable set in one
operation. As in the jitter routine, the transformation
routine transforms only local variables for plotting and
leaves the original data structure intact.
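
The behavior described above, transforming selected variables for plotting while leaving the original data intact, can be sketched in Python. This is an assumption-laden illustration rather than the APL routine; the function name transform_columns and the mapping of column index to function are hypothetical, and the data values are illustrative only.

```python
import math


def transform_columns(data, requests):
    """Return a transformed copy of data; the original is not modified.

    data     : list of rows, one row per observation
    requests : dict mapping a column index to a one-argument function,
               e.g. {1: math.log}; unlisted columns are copied as is.
    """
    def identity(value):
        return value

    return [
        [requests.get(j, identity)(x) for j, x in enumerate(row)]
        for row in data
    ]


# Illustrative values only: weight (lbs) and engine displacement (cu in).
cars = [[2100.0, 98.0], [2950.0, 151.0], [3600.0, 302.0]]
plotted = transform_columns(cars, {1: math.log})  # log of displacement
```

The log, square, and square root rows of Table I correspond to math.log, lambda x: x ** 2, and math.sqrt in this sketch.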



V. AN ANALYSIS OF AUTOMOBILE DATA



A. INTRODUCTION 

An analysis is presented of data consisting of selected 
characteristics of automobiles manufactured during 1983 and 
tested by Consumers Union. The primary purpose of this 
chapter is to demonstrate an application of Draftsman 
displays in exploratory data analysis. The analysis 
initially explores the general descriptive qualities of the 
characteristics of automobiles using the basic Draftsman 
display procedures. Subsequent analysis focuses on observed
variables of interest as developed through the enhancement
features of Draftsman.

B. THE AUTOMOBILE DATA 

The data was initially formatted as a two dimensional
array consisting of 106 rows and 14 columns. Each row of
the data matrix corresponds to one of the 106 different
models of automobiles as tested by Consumers Union [Ref. 2:
pp. 320-356]. The columns contain various characteristics
for each of the automobiles. These fourteen variables
comprise the three general categories of price, performance,
and size. The price category consists of the suggested
retail price of the basic automobile without additional
options. The performance variables include fuel efficiency
(city and highway), turning radius, gear ratio, and
vehicle repair records for the two preceding years. The
size variables consist of length, weight, headroom, rear
seating space, trunk size, and engine displacement. A
general variable, automobile, corresponds to each of the
specific models upon which the data is based. A summarized
description of the data is shown in Table II for reference.
TABLE II

Automobile Data Characteristics

VARIABLE              UNITS              REMARKS

Automobile                               1 to 48: small cars
                                         49 to 98: mid cars
                                         99 to 106: large cars
Price                 $1000
MPG City              Miles per gallon   EPA rated
MPG Highway           Miles per gallon   EPA rated
Repair 81                                0 = not rated
Repair 82                                1 = very poor
                                         2 = poor
                                         3 = average
                                         4 = good
                                         5 = very good
Headroom              Inches
Rear seat space       Inches
Trunk size            Cubic feet
Weight                Pounds
Length                Inches
Turning radius        Feet
Engine displacement   Cubic inches
Gear ratio



C. PRELIMINARY ANALYSIS 
1. General

The general Draftsman display of variables was 
generated as discussed in chapter II. A reduced version of 
the basic displays is seen in figures 5.1 through 5.4 . The 
actual Draftsman displays used for analysis may be found in 
Appendix B. For convenience and clarity, individual scatter 
plots from the displays will be included within applicable
sections of the text. 
Figure 5.1 Segment 1 of Automobile Data.
Figure 5.2 Segment 2 of Automobile Data.
Figure 5.4 Segment 4 of Automobile Data.






The general characteristics of automobiles are
likely to be familiar to most readers. Intuitively we can
perhaps surmise many of the relationships of the data
structure without even looking at it. This familiarity,
however, will enable us to concentrate more on the features
of the Draftsman program in exploring the data. Additionally,
we may confirm intuitive knowledge or perhaps change some of
our perspectives based upon the analysis.

2. Characteristics of Price

Focusing on price as it relates to the other
parameters, the first visual message imparted in figure 5.5
is that price bands delineate the major categories of
automobiles. Generally the small sized cars are grouped at
under $10,000 while midsize models are rather tightly grouped
between $7,500 and $12,000. If we concentrate on deviations
from this pattern, the outliers reveal an interesting
feature. From each major size category to the next there is
a substantial increase in the number of outliers within the
categories. These outliers are predominantly luxury models
within their respective categories.

When price and weight are compared, a gentle upward
sloping trend dominates, denoting that price and weight are
positively related, which is to be expected (figure 5.5).
This relationship levels off at about $10,000. A very
obvious branch from the main trunk of observations shows
price increasing relative to weight at a greater rate. The
presence of this branch suggested additional research to
determine if a significant parameter was missing from the
data. The research revealed that this uppermost branch
consisted of luxury style models, all but one of foreign
manufacture. The majority of the outliers contained between
the two branches are the luxury models of American origin. We
might conclude that weight is generally associated with an
increase in price, with the foreign luxury models tending to
be more expensive than American luxury models of comparable
weight.

Figure 5.5 Characteristics of Price.

Upward curving relationships similar to that
observed in the scatter plot of price versus weight can be
seen in some of the other plots in the price row in figure
5.5. The plot of price and weight is most closely resembled
by that of price and length. This, however, may be somewhat
deceiving. A little thought might lead us to conclude
that these similarities are due more to a relationship
between length and weight. Consistent with this are the two
plots of rear seating space and trunk size versus price.
Although they loosely resemble the pattern of the price
versus weight plot, we should suspect that they are
influenced more by the overall dimension of automobile
length. These plots demonstrate the care that must be taken
in the analysis of single scatter plots. We must be cautious
since each scatter plot in the array denotes only the
isolated relationship of two variables and may not
necessarily indicate a causal relationship.
Figure 5.6 Size and Internal Dimensions.

The tendency for increased car weight to be
associated with a related increase in engine displacement is
another observation we would expect, and is seen in figure
5.7. Of significance is that the displacement versus weight
observations fall into two distinct groups. Vehicles weighing
up to 3200 pounds have a very tight increasing linear
relationship with engine displacements up to 175 cubic inches.
The heavier vehicles are seen to be associated with larger
engine displacements, albeit with a more dispersed cluster of
points.

Engine displacement in turn can be seen to have a
definite correspondence to the overall automobile
categories. A close look at the second plot in figure 5.7
reveals that small cars tend to be banded with engine
displacements from 50 to 150 cubic inches. Noticeable is the
one small car outlier, identified as the AMC Spirit 6, with a
much larger displacement of 258 cubic inches. Medium category
cars are fairly evenly distributed in two bands of engine
displacement. The lower band spans displacements of 125 to
160 cubic inches while the upper band tightly spans
displacements of 220 to 260 cubic inches.

Figure 5.7 Size, Displacement and Vehicle Model.

The outliers in the medium car class with
significantly larger engine displacements were identified as
the Chrysler Cordoba V6, Chrysler Cordoba V8, and Lincoln
Continental V8. Almost all of the larger cars are clustered
at the 300 cubic inch displacement level, with two
exceptions: the Buick Electra V6 and Buick LeSabre V6, which
have lower displacements of 252 and 231 cubic inches
respectively. Notwithstanding the outliers, vehicle class
and engine displacement are strongly correlated. Overall, the
deviations of the outliers in figure 5.7 have an
interesting property. They are all of American manufacture
and deviate either up or down one engine displacement group.
These traits suggest that these vehicles may previously have
been in a different size class and that changes in some other
characteristic features resulted in their being moved up or
down an automobile class.
4. Vehicle Performance

Consumers should have a particular interest in the fuel
efficiency characteristics of automobiles. Not surprising is
the trend shown in figure 5.8 that correlates fuel efficiency
in the city with fuel efficiency on the highway. A comparison
of the remaining variable plots for these efficiency
parameters shows identical relationships in all cases. The
original data structure could probably exclude one of these
fuel efficiency variables without loss of information if we
needed to condense the data.
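
The redundancy argument above can also be checked numerically.
The sketch below is only an illustration: the function name
`pearson`, the 0.95 cutoff, and the mileage figures are all
assumptions invented for the example and are not taken from the
car data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical city/highway mpg values, invented for illustration only.
city = [14, 18, 21, 25, 30]
highway = [20, 25, 29, 34, 40]

r = pearson(city, highway)
# 0.95 is an arbitrary illustrative cutoff, not a value from the thesis.
redundant = abs(r) > 0.95
print(redundant)
```

When two variables are this strongly related, every plot against
a third variable looks essentially the same for both, which is
the visual pattern the Draftsman display exposes.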
Figure 5.8 Fuel Efficiency, Weight, and Displacement.

The inverse relationship of fuel efficiency to vehicle weight
is not unexpected and confirms our intuition (figure 5.8).
High fuel efficiency, low weight, and smaller engine
displacements are all associated.

As previously mentioned, price alone is not an indicator of
automobile maintainability. An interesting observation,
however, can be drawn from the relationship between the repair
records from one year to the next. The plot of Repair 81
versus Repair 82 in figure 5.9 indicates a strong positive
correlation between the two. In almost all instances the
maintainability does not change for better or worse by more
than one level. Furthermore, the numbers of automobiles that
improved, deteriorated, or did not change in terms of
maintainability are approximately equal.






Figure 5.9 Maintainability of Automobiles. 

Repair record, when compared to engine displacement, reveals
that the majority of rated vehicles (i.e., vehicles for which
repair data was available) contained the smaller displacement
engines. The concentration of better maintenance values at the
lower displacement level suggests that smaller engines have
better maintenance records.

D. ANALYSIS WITH ENHANCED DISPLAY 
1. General

The analysis of the basic draftsman display revealed a wealth
of features pertaining to the individual variables within the
data. One distinct feature evident is the potential
relationship between foreign and American manufactured
automobiles. While not an original parameter of the data, the
plots of price versus weight and price versus model indicate
that this influence warrants closer scrutiny.
In the basic display the overlapping of maintenance values
was alleviated with jittering. The array of plots dealing with
headroom space suggests that this variable should be treated
likewise. The remaining automobile characteristics do not
suffer from any significant problems with overlapping values.
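
The effect of jittering on a discrete variable can be sketched
as follows. This is a minimal illustration and not the JITTER
routine of the Draftsman program: the function name `jitter`,
the use of uniform noise, the `width` and `seed` parameters, and
the sample ratings are all assumptions made for the example.

```python
import random

def jitter(values, width=0.4, seed=0):
    """Return a copy of values with uniform noise in [-width/2, +width/2].

    Assumption: uniform noise; the thesis program's own scheme may differ.
    """
    rng = random.Random(seed)  # fixed seed so a display can be reproduced
    half = width / 2.0
    return [v + rng.uniform(-half, half) for v in values]

# Hypothetical repair ratings on the 1-5 scale; several cars share a value
# and would be plotted on top of one another without jittering.
ratings = [3, 3, 3, 4, 4, 5, 1, 3]
spread = jitter(ratings)

# The noise is small enough that each point still rounds to its rating.
assert [round(x) for x in spread] == ratings
```

Keeping the noise well under half the spacing between adjacent
levels separates coincident points while leaving each level
visually distinct.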

Based upon the preliminary analysis, an enhanced display was
generated for subsequent evaluation. The redundancy of the two
fuel efficiency variables was resolved by eliminating miles
per gallon on the highway. The remaining variables were
reordered to place those with similar relationships in closer
proximity. The enhanced display also introduces a new discrete
category variable, location of manufacture. A value of 1 for
this variable corresponds to those automobiles produced in
America, vehicles produced overseas under an American brand
name are denoted with a 2, and foreign models are assigned
a 3.

The introduction of location of manufacture dramatically
portrays some very evident dichotomies which exist between
foreign and American made automobiles. In general, the array
of plots consisting of these parameters indicates a very
different orientation on the part of the respective
manufacturers in their approach to the automobile market.

The potential for reexpressing the data through
transformations was considered. Transforming engine
displacement with a log transform slightly straightens the
plots containing this variable with respect to some of the
other size parameters, as seen in figure 5.10. This
reexpression, however, does not really enhance the description
of the data and hence was not included in the final display.

The complete enhanced draftsman display may be found in
Appendix B. Isolated portions of this display will be
reproduced within this section of the text for clarity.

Figure 5.10 Log Transformation of Engine Displacement.

2. Price

The majority of both American and foreign cars are priced
below $12,000. Extremely visible is the large grouping of
American models in the neighborhood of $9,000 to $11,000, as
depicted in figure 5.11. American model prices beyond this
level tend to increase in rather uniform increments of $2,000,
up to $24,000. In contrast, foreign car prices are fairly
uniformly distributed in the region of $5,000 to $15,000, with
subsequent price increases in larger increments of $5,000 to
the maximum level of $35,000.




Figure 5.11 Location of Manufacture and Price. 



3. Size 



In general, automobiles of American and foreign origin fall
within two distinct size ranges (figure 5.12). In terms of the
major dimensions of length and weight, the plot arrays provide
some distinguishing features contrasting location of
manufacture. Not surprisingly, American vehicles tend toward
the longer and heavier side while foreign manufactured cars
tend to be shorter and lighter.

Figure 5.12 Location of Manufacture and Size.

In view of the propensity for foreign cars to be shorter than
their American counterparts, the plots of the related inner
size characteristics shown in figure 5.12 are somewhat
unexpected. An evaluation of rear seating space, trunk size,
and headroom shows that the distribution of values for foreign
produced vehicles is slightly shifted to the smaller
dimensions in contrast to the respective American values. This
is consistent with observations noted in the basic display.
What is unexpected is that the differences based upon the
shifts are much smaller than we might expect given the
prevalent difference in length distributions between American
and foreign cars.

In particular, the rear seat spaciousness of American cars is
fairly evenly distributed between 25 and 30 inches, with the
large models being outliers at 32 inches. The foreign models
are more widely dispersed from 20 to 29 inches. Thus at the
upper end of the spaciousness scale there is actually only a
three inch advantage by the largest of the American models.

The characteristic of headroom shows a relationship similar
to that observed for rear seating space (figure 5.12). Again,
approximately 50% of the foreign car headroom values fall
within the same distribution range (2 1/2 to 4 1/4 inches) as
that of the vast majority of the American models.

The difference between trunk sizes seen in figure 5.12 is a
bit more acute in that the foreign models range from 10 to 14
cubic feet, while the American cars are skewed toward 12 to 14
cubic feet. Clearly, in spite of the distinct length
differences between foreign and American produced cars, the
differences in internal dimensions are much subtler and
smaller than we might have originally suspected. The foreign
cars, although smaller in overall length, have approximately
the same internal size features as all but the very largest
American made cars. It is also interesting to note that the
American sponsored models produced overseas tend to exhibit
the characteristics of the foreign models.

4. Performance

The distribution of the fuel efficiency characteristics of
American and foreign automobiles appears to be the inverse of
their weight (figure 5.13). The heavier American cars tend to
be evenly distributed between 9 and 20 mpg with only three
outliers extending beyond 25 mpg. The lighter foreign cars,
while ranging from 15 to 28 mpg, are rather tightly grouped
between 20 and 25 mpg. The outlier in this case is at the
extreme range of 33 mpg. In terms of fuel efficiency, the
foreign vehicles certainly dominate this attribute of
performance.

Perhaps the most revealing plots in the enhanced display are
those of the repair records. In both recorded years the
American models are rather evenly distributed from poorer than
average (1) to average (3) maintainability ratings. A better
than average rating (4) was achieved only four times over both
years. In extreme contrast, the foreign models during both
years show a tendency toward the much better than average
maintainability rating (5). As with fuel efficiency, the
foreign models dominate this performance variable of
maintainability.

Figure 5.13 Location of Manufacture and Performance.



E. CONCLUSIONS 

The car data is an excellent example of how the Draftsman
display can be used to describe a data set. The various
parameters associated with automobiles can be very confusing
to the consumer. No single parameter can be selected as an
overall measure of what constitutes a "best" automobile. What
one consumer may find desirable, another consumer may find
unacceptable. Thus, describing or modeling the data with more
formal statistical techniques such as linear regression is not
very applicable. The Draftsman display enables the user to
observe the multivariate effects of each of the various
parameters. Based upon the selection of one or more
parameters, the user can determine the impact relative to
other parameters.






VI. AN ANALYSIS OF CONTRACT DATA

A. INTRODUCTION 

A graphical analysis is presented using the Draftsman display
on data collected concerning selected Naval contracts signed
during the period 1949 through 1963. This chapter explores the
general descriptive qualities of eleven categories of
contractual information relative to the performance of the
contracts.

The data originally was analysed in a thesis completed in
1973 [Ref. 7] through regression and analysis of variance
techniques. Significant in this study was the conclusion that
there was no clear method of describing the relationships
between contract parameters and the subsequent performance of
the contracts. It is this author's opinion that the analysis
failed because the use of linear regression alone is not
sufficient to adequately describe the relationships present in
the data. The analysis presented here, based upon a Draftsman
display, suggests that a variety of relationships do exist
describing contract performance relative to the contractual
parameters.

B. THE CONTRACT DATA 

The data consist of 177 contracts which comprise all Naval
aircraft and missile fixed-price incentive contracts completed
during the period 1949 through 1963. The data as provided by
the Naval Material Command encompass 11 parameters as
follows:

1. Deviation from target cost (percent).

2. Months to complete contract (months).

3. Target profit of manufacturer (percent).

4. Sharing ratio (percent).

5. Ceiling price (percent of target price).

6. Target cost of contract (millions of dollars).

7. Number of items produced in the contract.

8. Number of contracts let that year.

9. Year the contract was signed (see table III).

10. Contractor awarded contract (see table III).

11. Type of system (see table III).



                          TABLE III

                Description of Variable Coding

Codes for variable 9          Codes for variable 11
YEAR SIGNED                   SYSTEM TYPE

   1 = 1949                   1  Utility Airplane
   2 = 1950                   2  Combat Airplane
      ...                     3  Missile
  15 = 1963                   4  Blimp
                              5  Helicopter
                              6  Drone
                              7  Airborne Equipment

Codes for variable 10
MANUFACTURER

   1  Beech        7  Hiller        13  Ryan          19  Philco
   2  LTV          8  Kaman         14  Sikorsky      20  Maxson
   3  Convair      9  Martin        15  Bell          21  Northrop
   4  Douglas     10  McDonnell     16  Lockheed      22  Raytheon
   5  Boeing      11  N. American   17  Bendix        23  Aerojet
   6  Grumman     12  Vertol        18  Gen Elect.





C. THEORY OF FIXED-PRICE INCENTIVE CONTRACTS

The concept behind fixed-price incentive (FPI) contracts is
that they are intended to be used in the development,
management support, and production of items for which the
uncertainty of cost is too great to make a firm-fixed price
(FFP) contract attractive to bidders. In theory, the FPI
contract should motivate control of costs by rewarding the
manufacturer with a greater profit level as costs are reduced
below the negotiated target cost.

The incentive feature of the FPI contract should influence
the contractors to effectively manage cost associated
decisions in a manner beneficial to profit. This in turn
should result in a favorable cost outcome to the government as
well. This mutually favorable outcome is communicated in the
form of the sharing ratio, which establishes the amount of
money which will be returned to the contractor for every
dollar saved below the target cost of the program. For
example, a 75/25 sharing ratio returns 25% of every dollar
saved to the contractor while reducing the government's
expected cost by .75 dollars. The higher the percent returned,
the greater the potential for gain or loss to the contractor.
Hence, lower sharing ratios reflect a greater degree of
financial risk to the contractor.

The ceiling price of a contract is a control measure to avoid
excessive cost overruns to the government. The ceiling price
establishes the maximum amount of cost which will be paid by
the government. When final cost exceeds the ceiling cost, the
difference must be borne out of pocket by the contractor as a
loss. Cost outcomes which fall between the negotiated target
and ceiling values result in a break even venture to the
manufacturer.
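
The interaction of sharing ratio and ceiling price can be
illustrated with a short calculation. The sketch below assumes
a common textbook form of the FPI settlement, in which profit
moves by the contractor's share for every dollar of underrun or
overrun and the price paid is capped at the ceiling. The
function name `fpi_settlement`, its parameters, and the dollar
figures are hypothetical, and the exact terms of the contracts
in the data may have differed.

```python
def fpi_settlement(target_cost, target_profit, contractor_share,
                   ceiling_price, final_cost):
    """Return (price_paid, contractor_profit) for one contract.

    Assumed model: profit = target profit plus the contractor's share of
    any underrun (or minus its share of any overrun); the government
    never pays more than ceiling_price.
    """
    profit = target_profit + contractor_share * (target_cost - final_cost)
    price = min(final_cost + profit, ceiling_price)
    return price, price - final_cost

# Hypothetical contract in millions of dollars: target cost 100,
# target profit 9, a 75/25 sharing ratio, and a ceiling price of 120.
under = fpi_settlement(100.0, 9.0, 0.25, 120.0, final_cost=90.0)
over = fpi_settlement(100.0, 9.0, 0.25, 120.0, final_cost=130.0)

print(under)  # (101.5, 11.5)
print(over)   # (120.0, -10.0)
```

In the underrun case the contractor keeps 2.5 of the 10 saved
and the government pays 7.5 less than the target price of 109,
matching the 75/25 split described above. In the overrun case
the ceiling binds and the contractor absorbs the excess as a
loss.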



D. PRELIMINARY ANALYSIS 

1. General

A Draftsman display of the eleven contractual parameters was
generated for preliminary analysis. Most noticeable was that
nine of the eleven variables consisted of discrete values,
which resulted in substantial overlapping of plotted points
throughout the display. The plots containing the number of
items produced also indicate a problem in scaling. Contracts
range in size from 40 to 1400 items. This problem, as shown in
figure 6.1, is caused by the eight extreme outlier contracts
containing in excess of 800 items. The compression of the
remaining majority of the contracts into a very small segment
of the plots would prevent observations of any meaningful
value.




Figure 6.1 Number of Items per Contract.

A subsequent Draftsman display was generated to alleviate the
problem of overlap as well as the scaling of number of items
per contract. To address the overlap problem, all variables
except deviation from target cost, target cost, number of
items, and manufacturer were jittered. To address the scaling
problem, a log transformation was applied to the number of
items.
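
The way a log transformation relieves this compression can be
seen by asking where a given contract falls along the plotting
axis before and after the transform. The sketch below is an
illustration only: the helper `axis_fraction` is hypothetical,
and the 200 item contract is an assumed "typical" size, since
the text gives only the overall range of 40 to 1400 items.

```python
from math import log10

def axis_fraction(value, lo, hi, transform=lambda v: v):
    """Fraction of the axis length at which value is plotted."""
    t_lo, t_hi = transform(lo), transform(hi)
    return (transform(value) - t_lo) / (t_hi - t_lo)

# Range quoted in the text: contracts of 40 to 1400 items.
# 200 items is a hypothetical contract size chosen for illustration.
linear = axis_fraction(200, 40, 1400)
logged = axis_fraction(200, 40, 1400, transform=log10)

print(round(linear, 2), round(logged, 2))
```

On the raw scale such a contract sits in roughly the first
eighth of the axis, while on the log scale it sits near the
middle, so the bulk of the contracts are spread over a usable
portion of each plot.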

The Draftsman segments generated with enhancements were
reduced and are shown in figures 6.2 through 6.5. For
convenience and clarity of discussion, appropriate plots will
be reproduced within the body of the chapter text. The
original display segments may be seen in Appendix C.
Figure 6.2 Draftsman Segment 1, Contract Data.

Figure 6.3 Draftsman Segment 2, Contract Data.

Figure 6.4 Draftsman Segment 3, Contract Data.

Figure 6.5 Draftsman Segment 4, Contract Data.




2. Characteristics of Size
a. Volume of Contracts 

As a preliminary expression of size, the number 
of contracts per year provides an estimate for the volume of 
contracts let during a particular period of time. We might 
suspect that a low volume of contracts would create a more 
competitive atmosphere among manufacturers as they attempt 
to maintain their facilities in a production mode. A high 
volume of contracts let offers a greater opportunity for 
manufacturers to select contracts in which they have the 
greatest amount of expertise and experience. The latter 
case has a greater potential for controlling costs as well 
as manufacturer profits. Thus, we might expect that as the
number of contracts let increases, the positive deviations
from target cost should decrease.

The plot of cost deviation versus number of 
contracts seen in the first plot of figure 6.6 does appear 
to generally support this hypothesis. As the volume of 
contracts increases there is a tendency for cost deviation 
to be negative. In fact, the greater the volume, the 
greater in magnitude the negative cost deviation. 

The cost deviation versus volume relationship, when compared
over time, also suggests that the time at which the contracts
were signed may have additional bearing (see figure 6.6). The
rapid increase in contract volume from 1949 to 1951 is
characterized by large absolute deviations from target cost
(though generally negative). As volume declined from 1951
through 1955, the absolute deviations from target cost can be
seen to be much smaller and roughly equally distributed
between positive and negative. The subsequent volume increase
experienced from 1955 to 1958 also shows an increase in the
absolute deviations from target cost (with a fair tendency
toward negative deviations). The volume decline from 1958 to
1963 is somewhat more difficult to interpret due to the
relatively low volume of contracts. Absolute deviations do
appear to become slightly smaller as volume decreases.
However, when contrasted to previous years of similar volume,
the deviations, while becoming smaller, appear to be doing so
to a lesser degree than previously. The last three years of
the data period, while characterized both by low volume and by
a small absolute deviation from target costs, clearly show a
tendency towards positive cost deviation.

Figure 6.6 Contract Volume.



b. Contract Duration 

Based upon production management techniques, we might expect
that the duration of a contract would have a relationship to
contract performance. Short term contracts leave little time
for management to adjust production activities to maximize the
efficiency of operations. As contract duration increases, a
greater opportunity is afforded to contractors to learn from
early production errors and make the cost related decisions
necessary for control. When contract duration extends far into
the future, difficulties can arise from external economic
influences which could not be accurately forecast at the
onset.

Figure 6.7 Contract Duration and Performance.

The isolated plot of deviations from cost relative to
contract duration reproduced in figure 6.7 reveals some
interesting features. The contracts of less than 40 months
duration exhibit widely dispersed deviations from target cost.
For the contracts which lasted between 40 and 70 months the
cost deviations exhibit a clear trend toward the negative
side. Contracts which exceed 70 months in duration show an
increased deviation that is roughly equally split between
positive and negative.

The cost deviation characteristics relative to duration noted
above appear to hold irrespective of the year in which the
contracts were signed (see figure 6.7). Contracts of less than
40 months duration as well as those between 40 and 70 months
are fairly equally distributed across all years of the data
period. Whether contracts which exceed 70 months in duration
are affected by the year signed is not determinable, since all
but one of these contracts occurred during the first five
years.

c. Target Cost of Contracts 

The target cost parameter provides a measure of 
the financial size of the contracts let. The plot of this 
variable with respect to deviation from target cost is shown 
in figure 6.8. The greatest absolute deviation from target
cost can be observed when target cost is less than 100 
million dollars. In this region, the deviations tend to be 
negative but only by a slight numerical margin. The 
contracts which exceeded a target cost of 100 million 
dollars are clearly seen to exhibit a smaller absolute devi- 
ation from the target cost. These contracts are further 
characterized by generally favoring a negative cost 
deviation. 




Figure 6.8 Target Cost and Performance. 



3. Incentive Measures



a. Sharing Ratio 



The sharing ratio establishes the amount of money which will
be returned to the contractor for every dollar saved in cost.
This return reflects the potential for profit to the
manufacturer. The higher the sharing ratio, the higher the
potential gain. This relationship appears to be echoed in the
plot of these two variables as seen in figure 6.9. As the
sharing ratio increases so does the relative expected profit
level of the manufacturer. The risk factor associated with
contract duration can also be observed in the sharing ratio.
As contract duration increases, the influence of external
economic parameters can be forecast less accurately. The
general decline of sharing ratios as contract duration
increases can be seen in figure 6.9. This decline may be a
sign of the contractor's willingness to accept a lower
marginal profit position in order to decrease risk.

Figure 6.9 The Incentive of Sharing Ratios.

The most striking observation about the sharing ratio is
that, when evaluated against performance, little if any
relationship can be determined. We can see no clear indication
that any particular ratio can be associated with a favorable
(negative) cost deviation. This lack of relationship is
significant in that the sharing ratio is supposedly a major
incentive feature of fixed-price incentive contracts. A
determination that sharing ratios are an insignificant
parameter suggests that this method of contracting might
warrant further analysis by the government. It may be
interesting to note that a Rand study of over 400 Air Force
contracts resulted in a similar finding that the sharing ratio
was insignificant with respect to the final outcome of
contracts [Ref. 8, p. 38].

b. Target Profit 

The plots of negotiated target profit level in figure 6.10
reveal similar characteristics when compared to the contract
size parameters. Very evident is that target profit in general
tends to center on the 9% level. Given the time period during
which these contracts were performed, 9% represents a rather
lucrative profit level.




Figure 6.10 Target Profit and Size. 
tends to fluctuate about 2 percentage points above and below
the 9% level. The 9% profit level remains firm regardless
of the number of items.

Target profit plotted with duration and volume shows a slight
decline in expected profit levels as either duration or volume
increases, as depicted in figure 6.10. In the former situation
this may reflect a willingness to trade off a lower profit
level for the security of a longer term production operation.
The inverse relationship between profit and volume suggests
that during low volume periods contractors are attempting to
make the most out of the few contracts available. Conversely,
when contract volume is high the expected profit level
declines to about 9%. The conclusion might be drawn that with
more contracts available, the contractors are willing to
accept a slightly lower profit level per contract since the
opportunity is greater to win multiple contracts during high
volume periods.

As a measure of performance the target profit may lack
significance in determining the deviation from target cost.
The plot of target profit versus deviation from target cost
shown in figure 6.11 does not suggest a describable
relationship. The majority of the manufacturers tended to
negotiate about a 9% profit level. An analysis of the eight
most deviant outliers from this characteristic reveals that
seven of them belong to manufacturers for whom these contracts
were their sole participation during the entire 15 year
period. The comparison of target profit, deviation from target
cost and system type is also significant with respect to the
outliers. While there is nothing notable about their target
cost, the eleven most unfavorable cost outcomes (positive cost
deviations) correspond solely to three system types. These are
combat aircraft, missiles, and helicopters, denoted by system
types 2, 3, and 5 respectively in figure 6.11.

Figure 6.11 Target Profit and Performance.

E. PRELIMINARY CONCLUSIONS

The analysis and discussion presented reveal that an abundant
amount of information is visible in the Draftsman display of
the contract data. Further, there are indications of
relationships between the contractual parameters and
contractor performance. Briefly summarized, these are:

1. As the volume of contracts let increases there is a 
tendency for the contracts to result in a negative 
cost deviation (favorable to the government). 

2. Over the 15 year period as volume changed from year 
to year there appears to be a related reaction rela- 
tive to contractor performance. Periods denoted by 
an increasing volume are reflected with an increase 
in cost deviations. When volume declines a related 
decline in cost deviations can also be observed but 
at a more cautious rate. 

3. Contract duration as related to cost deviation might
better be described in terms of short, medium, and
long term contracts. The most favorable contract
duration appears to be between 40 and 70 months.
This relationship is fairly consistent regardless of
the year in which the contracts were signed or the
volume of contracts let.

4. Fluctuations in cost deviations tend to stabilize
when the contracts contain more than 50 items.

5. As the target cost increases there is a greater 
tendency for negative cost deviation to occur. 
Contracts in excess of 100 million dollars in partic- 
ular resulted in predominantly favorable outcomes. 

6. The sharing ratio as a traditional incentive measure 
of a FPI contract may lack merit. No relationship 
can be observed between this parameter and perform- 
ance. 

7. No obvious relationship can be noted between target 
profit levels and contract performance. 

8. The ultra high technology systems of combat aircraft,
helicopters, and missiles exhibit the greatest potential
for adverse performance.

F. ADDITIONAL CONFIRMATORY ANALYSIS 
1. General

The preliminary analysis using a single iteration of the
Draftsman display revealed a variety of interesting
relationships between contractual parameters. Certainly other
relationships exist which have not been discussed. As an
exploratory data analysis tool the Draftsman display enables
the user to look at the data at almost any level of detail
desired. Subsequent displays can be generated on various
subpopulations, such as each of the manufacturers, to gain
greater insight into their performance behavior. It is this
versatility in exploring data sets which enables the user to
rapidly process large amounts of data in order to gain a
feeling for the interactions involved.

The use of the Draftsman display can also assist the 
user in the application of more formal statistical 
approaches. The use of such techniques as regression anal- 
ysis provides a confirmatory measure to the exploratory 
indications viewed in the display. 

The blind application of some statistical packages without
first looking at the data can result in erroneous or
misleading conclusions. This is particularly true of large
data sets where misplaced decimal points, formatting errors
and other related problems may not be easy to detect. The
visual nature of the Draftsman display can assist in
identifying these problems as well as aid in selecting
appropriate variables on which to initiate formal analysis.

2. Cost Deviation Over Time

One major question which cannot be readily answered with the
Draftsman display of contract data is the relationship of cost
deviations over the entire contract period. The scatter plot
of percent cost deviation versus year signed reveals a wide
dispersal in cost deviations with no clear visual trend
apparent (figure 6.12).

A least squares linear model was selected in order to
determine if, for all manufacturers, a trend exists relating
cost deviations to the year in which contracts were signed.
The results indicate that an upward trend in cost deviations
did in fact occur from 1949 through 1963 (figure 6.12). The
computed t-value of 3.3 is quite significant: if the
coefficient B(1) were actually zero, the probability of
observing a value this large would be much less than .05. The
slope of the regression line is .580 with lower and upper
confidence limits of .232 and .927 respectively. This also
clearly supports the upward trend of cost deviations.
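
The reported slope, t-value, and confidence limits can be
checked against one another. The sketch below recovers the
standard error from the slope and t-value and rebuilds the
interval. The function name `slope_interval` is hypothetical,
and the critical value of 1.974 is an assumption (approximately
the two-sided 95% point for 177 - 2 = 175 degrees of freedom);
the text does not state the confidence level used.

```python
def slope_interval(slope, t_value, t_crit):
    """Recover the standard error from slope / t and form the interval."""
    se = slope / t_value          # since t = slope / standard error
    half_width = t_crit * se
    return se, slope - half_width, slope + half_width

# Reported values: slope 0.580 and t-value 3.3.
# t_crit = 1.974 is an assumed 95% critical value, not from the thesis.
se, lower, upper = slope_interval(0.580, 3.3, 1.974)

print(round(se, 3), round(lower, 3), round(upper, 3))
```

Under these assumptions the interval comes out very close to
the reported limits of .232 and .927, with the small gap in the
lower limit consistent with the t-value having been rounded to
3.3. Because the interval excludes zero, it leads to the same
conclusion as the t-test.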
Figure 6.12 Contract Performance Over Time.

While an overall increasing trend in cost deviations is
evident, the performance of the individual manufacturers
involved might reasonably be expected to differ. The
generation of Draftsman displays for each manufacturer would
provide the starting point for comparison. While the entire
displays are not presented, the scatter plots of two major
contractors (Grumman and Lockheed) indicate how different
performance results fit into the general cost deviation
picture.

In applying the linear regression model to Grumman, as seen
in figure 6.13, cost deviations rose rapidly over the time
period. This rise is much faster than that seen in figure 6.12
for all the firms in general. From a government perspective
this suggests that closer scrutiny of this company's
activities might be warranted.

Figure 6.13 Grumman Contract Performance. 

The application of the regression model to Lockheed, 
as seen in figure 6.14, presents a very different picture. 
In this instance a cubic fit rather than a straight line is 
more appropriate in describing the relationship of cost 
deviations over time. Cost deviations appear almost cyclical. 
Particularly noteworthy is the difference between Grumman 
and Lockheed during the last four years of the data period. 
Grumman cost deviations continued to rise while Lockheed 
experienced a sharp decline in deviations. Quite likely 
there are external considerations which are influencing 
cost for each of the manufacturers. 
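
The choice between a straight line and a cubic can be checked by fitting both and comparing the residual sums of squares. The sketch below is Python, not the thesis's APL, and solves the least-squares normal equations directly. The helper names `polyfit` and `residual_ss` and the data series are hypothetical and are not the Lockheed figures. A higher-degree polynomial never fits worse on this measure, so a smaller residual alone does not settle the choice and the plotted pattern should still be examined.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    m = degree + 1
    # Normal equations (A^T A) c = A^T y for the Vandermonde matrix A.
    ata = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    aty = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - tail) / aug[i][i]
    return coeffs

def residual_ss(x, y, coeffs):
    """Sum of squared residuals for a fitted polynomial."""
    return sum(
        (yi - sum(c * xi ** p for p, c in enumerate(coeffs))) ** 2
        for xi, yi in zip(x, y)
    )

# Hypothetical, roughly cyclical series (centered years, made-up deviations).
t = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
dev = [-1.0, 2.0, 3.0, 2.0, 0.5, -1.0, -1.5, 0.0, 4.0]
linear_rss = residual_ss(t, dev, polyfit(t, dev, 1))
cubic_rss = residual_ss(t, dev, polyfit(t, dev, 3))
```

Centering the time variable, as done here, keeps the normal equations well conditioned for low-degree fits.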
APPENDIX A 

DRAFTSMAN COMPUTER CODE 



∇ DRAFTSMAN;DATA;NCOL;TR;TC;PI;R;C;Y;TN;T2N;XAXIS;YAXIS;X;TX;LY;TY;H;POSN;LX
  ADMIN
  NCOL←¯1↑(⍴DATA)
  JITTER
  TRANSFORM
  TR←¯5
 LOOP4:TR←TR+5
  TC←¯5
 LOOP3:TC←TC+5
  PI←0.1 0.82 0.25 0.97
  R←0
 LOOP2:R←R+1
  C←0
  Y←DATA[;(TR+R)]
 LOOP1:C←C+1
  X←DATA[;(TC+C)]
  →((TR+R)=(TC+C))/SKIP
  POSN←PI+((0.18 ¯0.18 0.18 ¯0.18)×((C-1),(R-1),(C-1),(R-1)))
  XAXIS←N[(TC+C);]
  YAXIS←N[(TR+R);]
  →((C=1)∧((R=5)∨((TR+R)=NCOL)))/GRAPH
  XAXIS←' '
  →(C=1)/GRAPH
  XAXIS←N[(TC+C);]
  YAXIS←' '
  →((R=5)∨((TR+R)=NCOL))/GRAPH
  XAXIS←YAXIS←' '
 GRAPH:MINMAX
  RUN BASIC
 SKIP:→(((TR+R)≥NCOL)∧((TC+C)≥NCOL))/END
  →((C<5)∧((TC+C)<NCOL))/LOOP1
  →((R<5)∧((TR+R)<NCOL))/LOOP2
 END:PAUSE
  ERASE
  →((TC+C)<NCOL)/LOOP3
  →((TR+R)<NCOL)/LOOP4
∇
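
The main routine above appears to page through the variables in blocks of five, skip the plot of a variable against itself, and place each scatter plot by shifting a base viewport 0.18 of the screen per row and column. The following Python sketch illustrates that layout logic only. The function name `draftsman_pages` and the tuple layout are hypothetical, and the viewport constants are taken from the listing as read, so they should be treated as assumptions.

```python
def draftsman_pages(ncol, block=5):
    """Return a list of pages; each page is a list of panels.

    A panel is (row_var, col_var, viewport), with variables numbered
    from 1 and viewport = (x_min, y_min, x_max, y_max) given as
    fractions of the screen. Each page holds at most block x block panels.
    """
    base = (0.10, 0.82, 0.25, 0.97)   # upper-left panel of a page
    step = 0.18                       # spacing between adjacent panels
    pages = []
    for tr in range(0, ncol, block):          # row offset of the page
        for tc in range(0, ncol, block):      # column offset of the page
            panels = []
            for r in range(1, block + 1):
                for c in range(1, block + 1):
                    row_var, col_var = tr + r, tc + c
                    if row_var > ncol or col_var > ncol:
                        continue              # past the last variable
                    if row_var == col_var:
                        continue              # skip a variable against itself
                    viewport = (
                        round(base[0] + step * (c - 1), 2),
                        round(base[1] - step * (r - 1), 2),
                        round(base[2] + step * (c - 1), 2),
                        round(base[3] - step * (r - 1), 2),
                    )
                    panels.append((row_var, col_var, viewport))
            pages.append(panels)
    return pages
```

For example, `draftsman_pages(7)` describes four pages holding the 42 off-diagonal plots of a seven-variable data set.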
∇ ADMIN;RZ1;RZ2;RZ3
  'IS YOUR TWO DIMENSIONAL DATA SET'
  'ALREADY LOADED IN THIS WORKSPACE?'
  '(Y OR N)'
  RZ1←⍞
  →(RZ1='N')/LA
  'WHAT IS THE NAME OF THE DATA SET'
  DATA←⎕
  →LB
 LA:'TO HAVE YOUR DATA READ INTO'
  'THIS WORKSPACE FROM A CMS FILE'
  'ANSWER THE FOLLOWING QUESTIONS'
  DATA←CMSREAD
 LB:'DO YOU DESIRE ALL OF THIS DATA'
  'TO BE PRESENTED IN THE DRAFTSMAN'
  'DISPLAY OR JUST A SUBSAMPLE OF IT?'
  'ENTER (ALL OR SUB)'
  RZ2←⍞
  →(RZ2='ALL')/LC
  SUB DATA
 LC:'DO YOU HAVE A TWO DIMENSIONAL'
  'ARRAY OF NAMES FOR THE DATA'
  'WHICH IS TO BE DISPLAYED? ENTER (Y OR N)'
  RZ3←⍞
  →(RZ3='Y')/LD
  LABELS DATA
  N←NAMES
  →0
 LD:'WHAT IS THE NAME OF THE'
  'ARRAY OF VARIABLE NAMES?'
  N←⎕
∇



∇ SUB MATRIX;VR;VC;C1;RPOSN;CPOSN;CS;RZ1
  'ENTER AS A VECTOR THE VARIABLES'
  '(COLUMNS) FROM YOUR DATA SET WHICH'
  'YOU DESIRE TO BE DISPLAYED'
  C1←⎕
  'DO YOU DESIRE A SUBPOPULATION GROUP'
  'OUT OF ANY ONE VARIABLE?'
  'ENTER (Y OR N)'
  RZ1←⍞
  →(RZ1='N')/LD1
  'WHAT VARIABLE (COLUMN) IN THE ORIGINAL'
  'DATA IS THE SUB-GROUP?'
  VC←⎕
  'ENTER AS A VECTOR THE VALUES OF THE'
  'SUBPOPULATION GROUP THAT YOU WANT'
  VR←⎕
  →LD2
 LD1:VC←C1[1]
  VR←MATRIX[;VC]
  →LD2
 LD2:RPOSN←(MATRIX[;VC])∊VR
  DATA←RPOSN/[1]MATRIX
  CS←⍳CS←¯1↑CS←⍴MATRIX
  CPOSN←CS∊C1
  DATA←CPOSN/DATA
  'THE SUBDATA DESIRED IS A GLOBAL'
  'VARIABLE CALLED DATA AND HAS A'
  'SHAPE OF ',⍕(⍴DATA)
∇
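
The subsampling routine above appears to keep the rows whose value in one chosen column belongs to a requested set and then keep only the requested columns. A minimal Python sketch of the same selection follows. The function name `subset` and its parameters are hypothetical, and columns are numbered from 1 to match the prompts in the listing.

```python
def subset(matrix, columns, group_column=None, group_values=None):
    """Keep the requested columns and, optionally, only the rows whose
    value in group_column is one of group_values. Columns are 1-based."""
    if group_column is None:
        rows = matrix                      # no subpopulation: keep all rows
    else:
        wanted = set(group_values)
        rows = [row for row in matrix if row[group_column - 1] in wanted]
    wanted_cols = set(columns)
    # Columns are kept in their original order, as a selection mask would.
    keep = [j for j in range(1, len(matrix[0]) + 1) if j in wanted_cols]
    return [[row[j - 1] for j in keep] for row in rows]
```

Because the columns come back in their original order, any array of variable names has to be subset in the same way to stay aligned with the data.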



∇ LABELS DATA;IDX;I
  IDX←¯1↑(⍴DATA)
  'ENTER THE NAME OF EACH COLUMN'
  'IN ORDER, THE NAME MUST CONSIST'
  'OF NO MORE THAN 20 CHARACTERS'
  'TO INCLUDE BLANK SPACES'
  'NAME OF COLUMN 1?'
  I←1
  NAMES←1 20⍴NAMES←20↑(⍞,' ')
 LOOP:I←I+1
  'NAME OF COLUMN ',(⍕I),'?'
  NAMES←NAMES,[1](20↑(⍞,' '))
  →(I<IDX)/LOOP
  'THE COLUMN LABELS ARE A GLOBAL'
  'VARIABLE CALLED NAMES'
∇



∇ JITTER;SIZE;TEMP;UI;RNGX;XMAX;XMIN;RES1;RES2;X;C;FRONT;REAR;PT;JX
  'HOW MANY VARIABLES DO YOU DESIRE JITTERED?'
  RES1←⎕
  →(RES1=0)/0
  C←0
  'WHAT ARE THE VARS (COLUMNS) TO BE JITTERED?'
  RES2←⎕
 LOOPJ:C←C+1
  ⍎(RES1=1)/'PT←RES2'
  →(RES1=1)/JUMP
  PT←RES2[C]
 JUMP:X←DATA[;PT]
  SIZE←(⍴X)-1
  TEMP←(2÷SIZE)×((0,⍳SIZE)-(SIZE×0.5))
  UI←TEMP[(⍴TEMP)?⍴TEMP]
  RNGX←(XMAX←⌈/X)-(XMIN←⌊/X)
  JX←0.05×RNGX×UI
  X←X+JX
  FRONT←DATA[;(⍳(PT-1))]
  REAR←DATA[;((⍳(NCOL-PT))+PT)]
  DATA←(FRONT,[2]X),[2]REAR
  →(C<RES1)/LOOPJ
∇
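
The jittering routine above appears to build a set of offsets spaced evenly between -1 and 1, assign them to the observations in random order, and scale them by a fraction of the variable's range before adding them to the data. The Python sketch below follows that reading. The function name `jitter` and its parameters are hypothetical, and the 0.05 scale factor is an assumption taken from the listing as read.

```python
import random

def jitter(values, fraction=0.05, rng=random):
    """Add small uniform offsets so coincident points separate on a plot."""
    n = len(values)
    if n < 2:
        return list(values)
    # Evenly spaced offsets covering [-1, 1], one per observation.
    grid = [2.0 * k / (n - 1) - 1.0 for k in range(n)]
    rng.shuffle(grid)                       # assign offsets in random order
    spread = max(values) - min(values)      # range of the variable
    return [v + fraction * spread * u for v, u in zip(values, grid)]

# Hypothetical discrete data with many tied values.
scores = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
jittered = jitter(scores, rng=random.Random(1))
```

Because the offsets are a permutation of a symmetric grid, they sum to zero, so the mean of the variable is left unchanged and no point moves by more than `fraction` times the range.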



∇ TRANSFORM;RES;C;I;X;RES2;REAR;FRONT;A
  'HOW MANY VARIABLES DO YOU WANT TO HAVE TRANSFORMED?'
  RES←⎕
  →(RES=0)/0
  C←0
  'WHAT ARE THE VARS (COLUMNS) TO BE TRANSFORMED?'
  RES2←⎕
 LOOPA:C←C+1
  ⍎(RES=1)/'I←RES2'
  →(RES=1)/JUMP
  I←RES2[C]
 JUMP:X←DATA[;I]
  'USING X AS THE VAR, INPUT THE APL EXPRESSION'
  'FOR THE TRANSFORMATION DESIRED ON COLUMN ',⍕I
  A←⎕
  FRONT←DATA[;(⍳(I-1))]
  REAR←DATA[;((⍳(NCOL-I))+I)]
  DATA←(FRONT,[2]A),[2]REAR
  →(C<RES)/LOOPA
∇
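
The transformation routine above appears to take one column at a time, evaluate a user-supplied expression on it, and put the result back in the same position. A minimal Python sketch of that column replacement follows. The function name `transform_column` and the sample data are hypothetical, and a Python callable stands in for the expression typed at the terminal.

```python
import math

def transform_column(data, column, func):
    """Return a copy of data with one column (1-based) replaced by func(x)."""
    return [
        row[:column - 1] + [func(row[column - 1])] + row[column:]
        for row in data
    ]

# Hypothetical two-variable data set; the second column spans several
# orders of magnitude, so a logarithm spreads the points more evenly.
data = [[1.0, 10.0], [2.0, 100.0], [3.0, 1000.0]]
logged = transform_column(data, 2, math.log10)
```

The original rows are left untouched, so the transformed and untransformed displays can be compared side by side.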